Abstract

This paper is concerned with the nonparametric estimation of a distribution function F, when the data are incomplete due to grouping, censoring and/or truncation. Subsets $\{B_i\}$ of the real line are given and there are N independent observations $X_1, X_2, \dots, X_N$, where $X_i$ is drawn from the truncated distribution $F(x; B_i) = P(X \le x \mid X \in B_i)$. However $X_i$ may not be observed exactly and is known only to lie in the set $A_i \subseteq B_i$. The situation occurs frequently in survivorship, reliability, and recidivism analysis. Using the idea of self-consistency, a simple algorithm is constructed and shown to converge monotonically to yield a maximum likelihood estimate of F. The procedure compares favourably with the more cumbersome Newton-Raphson method. A test is proposed for comparing two distributions when data on one or both are incomplete, and some other applications of the empirical distribution function are indicated.

Keywords: EMPIRICAL DISTRIBUTION FUNCTION; CENSORING; INTERVAL CENSORING; TRUNCATION; GROUPING; SURVIVAL CURVE; MAXIMUM LIKELIHOOD; SELF-CONSISTENCY; NEWTON-RAPHSON; MULTINOMIAL DISTRIBUTION; TWO SAMPLE TEST; LOGRANK TEST; LEHMANN ALTERNATIVES
1. INTRODUCTION

In this paper we will be mainly concerned with the nonparametric estimation of the distribution F of a real valued random variable X, when the sample data are incomplete due to restricted observation brought about by grouping, censoring and/or truncation. More precisely the situation is as follows. Subsets $B_1, \dots, B_N$ of the real line are given and there are N independent observations $X_1 = x_1, \dots, X_N = x_N$, where $X_i$ $(1 \le i \le N)$ is drawn from the truncated distribution $F(x; B_i) = P(X \le x \mid X \in B_i)$, $x \in B_i$. Thus $X_i$ is truncated by $B_i$ or, in other words, the experimenter would not have been aware of the existence of that observation had $X_i$ not belonged to $B_i$. Moreover $X_i$ $(1 \le i \le N)$ may not be observed exactly and is known only to lie in the set $A_i$, where $A_i \subseteq B_i$. Thus $X_i$ is censored into the set $A_i$. Grouped data can be naturally considered as censored, where each observation is censored into one of a fixed collection of disjoint sets. The observed data are then the N pairs $(A_1, B_1), (A_2, B_2), \dots, (A_N, B_N)$.

The truncating sets $\{B_i\}$ can be viewed either as fixed or as random. We can now think of a partition of the set $B_i$, and $A_i$ is that member of the partition into which $X_i$ falls. Again the partition can be viewed either as fixed or as having arisen from some random mechanism independent of $X_i$. In many cases, the partition of $B_i$ will be unknown (except for the fact that $A_i$ belongs to it); these assumptions will make knowledge of the partition irrelevant to the estimation of F. The case of grouped data can be considered as one in which the partitions are known and are the same for each i $(1 \le i \le N)$.

If $B_i = (-\infty, \infty)$ then $X_i$ is not truncated, and if $A_i$ consists of a single point then $X_i$ is uncensored, i.e. is exact. We say that $X_i$ is interval censored if $A_i$ is of the form $[L_i, R_i]$, and $X_i$ is right (left) censored if $R_i = +\infty$ ($L_i = -\infty$). Of course if $L_i = R_i$, then $X_i$ is exact. Interval, right and left truncation are defined similarly. A sample is said to be singly censored if all the data are either exact or right censored, and doubly censored if the data are all either exact, right censored or left censored. This is now standard terminology.

Examples  of  right  censoring  are  very  common  e.g.  in  medical  follow-up 

and  industrial  life-testing  situations.  Interval  censoring  occurs  naturally  when 

the $\{X_i\}$ represent response times. Let us suppose that periodic inspections are made at times $t_1 < t_2 < \dots < t_m$ in order to see whether a certain event has yet happened. If it has already occurred by the first inspection, the observation is left censored in $(-\infty, t_1]$. If the response is first observed to have occurred at the k'th inspection $(2 \le k \le m)$ then the observation is censored into $(t_{k-1}, t_k]$, while if the event has still not happened by the last inspection, the observation is right censored in $(t_m, \infty)$. Examples of
interval  censoring  are,  for  instance,  described  in  Harris,  Meier  and  Tukey 
(1950),  Cohen  (1957),  Hartley  (1958),  Gehan  (1965),  Peto  (1973)  and  Turnbull 
(1974).  Also  the  bioassay  problem  discussed  by  Ayer  et  al  (1955)  can  be 
considered  an  extreme  case  of  double  censoring  when  there  is  no  exact  data. 

The same situation arises in the estimation of gap acceptance distributions in
traffic studies (see Miller (1974)).
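As an illustration of the periodic-inspection scheme described above, the following Python sketch maps a response time to the set into which it is censored. The function name `inspection_censoring` is hypothetical, and the true value `x` is used only to simulate what the inspector would record; in practice only the returned interval is observed.

```python
import bisect
import math


def inspection_censoring(x, times):
    """Return (lo, hi, kind) describing the set (lo, hi] recorded for a
    response occurring at time x, given inspections t_1 < ... < t_m.
    For right censoring hi is math.inf, i.e. the set is (t_m, inf)."""
    # k = number of inspections strictly before x, so t_k < x <= t_{k+1}.
    k = bisect.bisect_left(times, x)
    if k == 0:                       # event already happened by t_1
        return (-math.inf, times[0], "left censored")
    if k == len(times):              # event still not happened by t_m
        return (times[-1], math.inf, "right censored")
    return (times[k - 1], times[k], "interval censored")


inspections = [1.0, 2.0, 4.0]
for x in (0.5, 3.0, 4.0, 5.0):
    print(x, inspection_censoring(x, inspections))
```

A response exactly at an inspection time is counted as having occurred by that inspection, matching the half-open sets $(t_{k-1}, t_k]$ in the text.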

In most practical situations, each set $A_i$ will be an interval or a point. However the problem is not made much more complex if we allow the $\{A_i\}$ to be unions of intervals and points. This could arise if, in grouped data, non-adjacent groups had been pooled; for example, readings off the scale of the measuring instrument, whether too high or too low, might have been pooled. This more general type of censoring pattern has also been considered by Mantel (1967).

Truncation can occur if the population from which $X_i$ is drawn has been subject to some screening procedure in which all items with x-values outside $B_i$ have been removed. This situation can arise in consumer product testing, for example. If several data sets have been pooled to produce the sample then the $\{B_i\}$ will not necessarily all be identical. Another example of truncation occurs when the instrument which is measuring X needs a certain minimum level before it will respond at all.

Concerning survivorship analysis, Mantel (1966) mentions left truncation
in  the  context  of  merging  clinical  trials.  Here  a group  of  survivors  at  a 
certain  point  in  time  is  to  be  incorporated  into  ongoing  study  data  when  the 
original  size  of  the  group  of  which  these  are  a remnant  is  unknown.  The 
reentry  problem,  also  suggested  by  Mantel,  is  an  example  where  there  can  be  a 
more  general  truncation  pattern.  This  situation  occurs  when  a person  can  be 
lost  to  follow-up,  by  leaving  a health  insurance  programme  for  instance,  but 
then  he  rejoins  at  a later  date.  If  he  had  died  in  the  intervening  interval 
we would not have been aware of it. Here $B_i$ is of the form $(-\infty, b_1] \cup [b_2, \infty)$,

but  one  could  envisage  a more  general  situation  where  a person  could  enter  or 
leave  the  programme  several  times.  Of  course,  with  some  effort,  we  may 
uncover  information  about  an  individual  who  might  otherwise  be  lost.  However, 
not  only  will  this  be  expensive  but  it  could  also  introduce  a bias  if  the 
success  of  the  search  is  influenced  by  whether  or  not  death  has  occurred.  Thus 
an  unbiased  incomplete  (truncated)  sample  may  be  preferable  to  complete  but 
biased  data.  (Another  difficulty  is  to  ensure  that  the  person  has  not 
rejoined  because  his  health  has  deteriorated.  This  would  violate  our  assumptions. 
We  might  refer  to  this  situation  as  "prognostic  truncation".) 

The  problem  of  the  estimation  of  F when  some  parametric  form  for  F 
is  assumed  has  been  treated  extensively  in  the  literature.  Early  work  has  been 
summarized  by  Buckland  (1964,  Ch.  2).  For  example,  the  case  when  F is  normal 
has  been  considered  by  Cohen  (1957),  while  recently  Selvin  (1974)  has  examined 
the Poisson case. Blight (1970) has developed a general method for obtaining the maximum likelihood estimates of the parameters for any distribution in a multiparameter exponential family. (See also Hartley and Hocking (1971) and Sundberg (1974).) Most authors have assumed that the sets $\{B_i\}$ are intervals, the same for each i, and that the observations are either exact or censored into one member of a fixed set comprising several disjoint intervals.

We  shall  be  concerned  with  deriving  the  maximum  likelihood  estimate  (MLE) 
of  F when  no  parametric  assumptions  about  its  form  are  assumed.  Of  course, 
if  all  the  data  are  exact  with  no  truncation,  this  estimate  is  given  by  the 
empirical  (sample)  c.d.f.  When  the  data  is  subject  only  to  right  censoring, 
which  is  common  in  survivorship  and  life-testing  situations,  Kaplan  and 
Meier  (1958)  have  shown  that  the  MLE  of  F is  given  by  the  product  limit 
(PL)  method.  This  can  be  adapted  to  accommodate  left  truncation  as  well  by 
treating such data as "negative losses" (see Kaplan and Meier (1958, p. 463)). Trivially, by
reversing  the  scale,  the  PL  method  can  be  applied  to  data  subject  only  to 
left  censoring  and  right  truncation.  It  can  also  be  used  in  problems  with  no 
truncation  and  very  special  patterns  of  double  censoring  (Turnbull  (1974, 
p.  170))  and  of  interval  censoring  (Peto  (1973,  p.  87)).  Explicit  estimates 
are  also  available  for  certain  particular  interval  truncation  patterns  with  no 
censoring  (see  Section  5). 
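For right censored data the product-limit estimate mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not Kaplan and Meier's own presentation; it assumes that when an exact and a censored value tie, the censored item is treated as still at risk at that value (a common convention, assumed here), and the function name `product_limit` is ours.

```python
def product_limit(times, exact):
    """Kaplan-Meier product-limit estimate of F for right censored data.

    times: observed values; exact[i] is True for an exact value and False
    for a right censored one. Returns [(t, F_hat(t))] at each distinct
    exact value t.
    """
    data = sorted(zip(times, exact))
    at_risk = len(data)          # number of items with value >= current t
    surv = 1.0                   # running product, i.e. 1 - F_hat
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group tied values
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            out.append((t, 1.0 - surv))
        at_risk -= deaths + censored
    return out


print(product_limit([1, 2, 3, 5], [True, False, True, False]))
# [(1, 0.25), (3, 0.625)]
```

The remaining mass 1 - 0.625 = 0.375 lies beyond the largest (censored) observation at 5, where the estimate does not say how it is distributed.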

For  obtaining  estimates  in  more  general  situations,  explicit  solutions 
of  the  likelihood  equations  are  not  available  and  iterative  methods  must  be 
used.  For  interval  censored  data,  Peto  (1973)  employed  direct  but  rather 
cumbersome  Newton-Raphson  search  methods  to  maximise  the  likelihood.  Turnbull 
(1974)  used  the  idea  of  self-consistency  (cf.  Efron  (1967))  to  obtain  a simple 
algorithm  in  the  doubly  censored  case. 


The  purpose  of  this  paper  is  threefold.  Firstly,  a simple  iterative 
procedure  is  proposed  for  finding  the  MLE  of  F for  the  general  case  of 
arbitrarily censored and truncated data. This method can be considered the nonparametric analogue of that of Blight (1970). Secondly a new method of proof of the equivalence of self-consistency and maximum likelihood is presented. This method utilizes the relation between the values of the likelihood and successive approximations of the estimates, giving at the same time some insight as to why the theorem should be true. The proof differs from the rather inelegant and lengthier arguments used previously for the easier special cases of single censoring (Efron (1967, Thm 7.1)) and of double censoring (Turnbull (1974)). Finally, the algorithm is shown to converge, and in a monotone fashion, a fact conjectured by Turnbull (1974) on the basis of empirical evidence.

In  Section  2,  the  likelihood  function  is  examined,  and  the  problem 
reduced  to  a simpler  one  of  estimating  the  parameters  of  a multinomial 
distribution  with  censoring  and  truncation.  In  Section  3,  the  self-consistency 
algorithm  is  described  and,  in  the  following  section,  is  shown  to  converge  to 
yield  the  MLE  of  F.  In  Section  5,  properties  of  the  algorithm  are  discussed 
and  comparisons  made  with  the  Newton-Raphson  method.  A two  sample  test  when 
one  or  both  samples  may  be  subject  to  censoring  and  truncation  is  proposed  in 
Section  6.  Finally,  some  further  problems  such  as  large  sample  properties  of 
the  estimates  and  the  handling  of  concomitant  variables  are  discussed. 
2. REDUCTION OF THE PROBLEM

We first show that the maximum likelihood estimate, $\hat F$, of F increases in only a finite number of disjoint intervals (or points). We shall use the same notation as Peto (1973), who obtained a similar result for interval censoring with no truncation.

Let us assume that each $A_i$ $(1 \le i \le N)$ can be expressed as the finite union of disjoint closed intervals, with the convention that an isolated point $\{x\}$ is a closed interval $[x, x]$ and that a semi-infinite interval is semi-closed only. Thus we can write


$$A_i = \bigcup_{j=1}^{k_i} [L_{ij}, R_{ij}] \qquad (i = 1, 2, \dots, N),$$

where $-\infty \le L_{i1} \le R_{i1} < L_{i2} \le \dots < L_{ik_i} \le R_{ik_i} \le \infty$ and $R_{i1} > -\infty$, $L_{ik_i} < \infty$. From a practical point of view, this restriction on the form of $A_i$ is unimportant. We now construct a set of disjoint intervals whose left and right end points lie in the sets $\{L_{ij};\ 1 \le j \le k_i,\ 1 \le i \le N\}$ and $\{R_{ij};\ 1 \le j \le k_i,\ 1 \le i \le N\}$ respectively, and which contain no other members of $\{L_{ij}\}$ or $\{R_{ij}\}$ except at their end points. We write these intervals

$$[q_1, p_1],\ [q_2, p_2],\ \dots,\ [q_m, p_m],$$

where $q_1 \le p_1 < q_2 \le p_2 < \dots < q_m \le p_m$. Also define

$$C = \bigcup_{j=1}^{m} [q_j, p_j]. \qquad (2.1)$$


For example in the case of single censoring, we have $k_i = 1$ for all i, and $L_{i1} = R_{i1}$ if $x_i$ is exact while $R_{i1} = +\infty$ if $x_i$ is right censored. Let u $(1 \le u \le N)$ be defined by $L_{u1} = \max_i L_{i1} < \infty$. If $L_{u1} = R_{u1}$ (i.e. the largest observation is uncensored), then m is the number of exact observations and $q_j = p_j$ is the value of the j'th smallest exact observation. If $R_{u1} = +\infty$ (i.e. the largest observation is censored), then m - 1 is the number of exact observations, the last interval $[q_m, p_m]$ is $[L_{u1}, \infty)$, and $q_j = p_j$ $(1 \le j \le m-1)$ are the values of the exact observations.
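The construction of the intervals $[q_j, p_j]$ can be sketched as a single sort-and-scan over the end points: an interval of the required kind is formed exactly when a left end point $L_{ij}$ is immediately followed, in sorted order, by a right end point $R_{ij}$. The Python sketch below assumes each $A_i$ is supplied as a list of closed intervals `(L, R)`, with `math.inf` for a semi-infinite end; at tied values left end points are sorted first, so closed intervals sharing an end point yield a single-point interval. The function name `innermost_intervals` is our own.

```python
import math


def innermost_intervals(sets):
    """Return the disjoint intervals [q_j, p_j] built from all end points.

    sets: list of A_i, each a list of closed intervals (L, R).
    """
    # Tag end points: 0 for a left end L, 1 for a right end R. Sorting the
    # (value, tag) pairs puts L before R when the values are equal.
    points = sorted((x, tag)
                    for union in sets
                    for lo, hi in union
                    for x, tag in ((lo, 0), (hi, 1)))
    cells = []
    for (x, tx), (y, ty) in zip(points, points[1:]):
        if tx == 0 and ty == 1:          # an L immediately followed by an R
            cells.append((x, y))
    return cells


# Single censoring: exact at 1 and 3, right censored at 2 and 5.
data = [[(1, 1)], [(2, math.inf)], [(3, 3)], [(5, math.inf)]]
print(innermost_intervals(data))   # [(1, 1), (3, 3), (5, inf)]
```

With the largest observation (5) censored, this gives m = 3 intervals, of which the first m - 1 = 2 are the exact observations and the last is $[5, \infty)$, as in the single-censoring example.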

Under the assumptions of Section 1, the likelihood is proportional to

$$L^*(F) = \prod_{i=1}^{N} \left[ P_F(A_i) / P_F(B_i) \right] = \prod_{i=1}^{N} \left\{ \sum_{j=1}^{k_i} \left[ F(R_{ij}+) - F(L_{ij}-) \right] \right\} \Big/ P_F(B_i). \qquad (2.2)$$

We will assume that $P_F(\bigcup_i B_i) = 1$, which occurs for instance if at least one observation is not truncated. The search for that function F that maximises (2.2) is facilitated by the following lemmas.

Lemma 1.

Any c.d.f. which increases outside the set C cannot be a maximum likelihood estimate of F, except in the trivial case when $A_i \cap C = B_i \cap C$ for all i.

Proof. Recall that $A_i \subseteq B_i$ and C is defined by (2.1). Suppose that a c.d.f. G assigns non-zero probability p to the set $A_i - C$ for some i. Then the likelihood can be strictly increased by "transferring" probability p from $A_i - C$ to $A_i \cap C$. Similarly if G assigns positive probability to a set $B_i - A_i - C$ or to $\bigcap_{i=1}^{N} B_i^c$, the likelihood can be improved by "transferring" probability from these sets to C. This improvement is again strict except in the trivial case mentioned.


Remark. When $A_i \cap C = B_i \cap C$ for all i, the maximum likelihood is unity and is achieved by any distribution which assigns zero probability to $\bigcup_{i=1}^{N} B_i - C$. The situation represents one of severe censorship and truncation; we will exclude such cases from our further discussion.


Lemma 2.

For fixed values of $F(p_j+)$, $F(q_j-)$ $(1 \le j \le m)$, the likelihood is independent of the behaviour of F within each interval $[q_j, p_j]$. The proof is obvious.

Now, for $1 \le j \le m$, define

$$s_j = F(p_j+) - F(q_j-). \qquad (2.3)$$

Then the vectors $s = (s_1, \dots, s_m)$, where $\sum_j s_j = 1$ and $s_j \ge 0$, define equivalence classes on the space of distribution functions F which are flat outside C. We will say that two such functions are equivalent if they have the same s-vectors, as defined by (2.3). All functions in the same equivalence class will have the same likelihood by Lemma 2, and Lemma 1 shows that we can restrict our search for an MLE to these classes. Therefore the MLE will, at best, be unique only up to equivalence defined in this way.

For example, for right censored data, the Kaplan-Meier PL estimate is undefined at the exact observation points and in an interval $[L, \infty)$, say, if the largest observation is at L and is censored. Of course one can obviate the ambiguity when $p_j = q_j$ by requiring F to be right continuous.
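The foregoing discussion shows the problem of maximising (2.2) reduces to one of maximising

$$L^*(s_1, \dots, s_m) = \prod_{i=1}^{N} \left( \sum_{j=1}^{m} \alpha_{ij} s_j \Big/ \sum_{j=1}^{m} \beta_{ij} s_j \right), \qquad (2.4)$$

subject to $\sum_j s_j = 1$, $s_j \ge 0$ $(1 \le j \le m)$, where

$$\alpha_{ij} = \begin{cases} 1 & \text{if } [q_j, p_j] \subseteq A_i, \\ 0 & \text{otherwise}, \end{cases} \qquad \beta_{ij} = \begin{cases} 1 & \text{if } [q_j, p_j] \subseteq B_i, \\ 0 & \text{otherwise}. \end{cases}$$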

We remark that we would be able to write down (2.4) immediately as the likelihood if there were a discrete scale for X (i.e. X could only take on values $t_1, t_2, \dots, t_m$, say). Then we would define $s_j = P(X = t_j)$. This was the situation in the double censoring problem considered by Turnbull (1974), in which it was required to estimate the probabilities that
a certain response time fell in the first month, the second month, etc.
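Given the intervals $[q_j, p_j]$, the indicators $\alpha_{ij}$, $\beta_{ij}$ and the objective (2.4) are straightforward to compute. The following Python sketch works on the log scale; the helper names `indicators` and `log_likelihood`, and the representation of each $A_i$ and $B_i$ as a list of closed intervals, are assumptions made for illustration.

```python
import math


def indicators(sets, cells):
    """Entry (i, j) is 1 if cell [q_j, p_j] lies inside the i'th set, else 0.

    Applied to the {A_i} it gives alpha_ij; applied to the {B_i}, beta_ij.
    """
    def inside(cell, union):
        q, p = cell
        return any(lo <= q and p <= hi for lo, hi in union)
    return [[1 if inside(c, s) else 0 for c in cells] for s in sets]


def log_likelihood(s, alpha, beta):
    """log L*(s) from (2.4)."""
    total = 0.0
    for a_row, b_row in zip(alpha, beta):
        total += math.log(sum(a * sj for a, sj in zip(a_row, s)))
        total -= math.log(sum(b * sj for b, sj in zip(b_row, s)))
    return total


cells = [(1, 1), (3, 3), (5, math.inf)]
A = [[(1, 1)], [(2, math.inf)], [(3, 3)], [(5, math.inf)]]
B = [[(-math.inf, math.inf)]] * 4          # no truncation
alpha, beta = indicators(A, cells), indicators(B, cells)
print(alpha)   # [[1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1]]
```

Each row of `alpha` records which cells the i'th censoring set covers; with no truncation every row of `beta` is all ones and the denominator in (2.4) is 1.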

Now since $A_i \subseteq B_i$ for all i, we have that $\alpha_{ij} = 1$ implies $\beta_{ij} = 1$. Let $\hat s = (\hat s_1, \dots, \hat s_m)$ denote a value of s for which $L^*$ attains its maximum in the region $R = \{s \mid \sum_j s_j = 1,\ s_j \ge 0\ (1 \le j \le m)\}$. We assume that neither of the following two trivial situations holds:

(A) There exist j, k with $1 \le j, k \le m$ and $j \ne k$ such that $\alpha_{ij} = \alpha_{ik}$ for all i $(1 \le i \le N)$.

(B) There exists a subset D of C, with D and $C - D$ both non-empty, such that for each i, $1 \le i \le N$, either $B_i \cap C \subseteq D$ or $B_i \cap C \subseteq D^c$.
If (A) occurs, $L^*$ depends on $s_j$ and $s_k$ only through their sum. In case (B) only the ratio $s_j / \sum_{k \in D} s_k$ is estimable for $j \in D$, and hence the $s_j$ $(j \in D)$ are defined only up to a multiplicative constant. (Condition (B) modifies a result of Asano (1965, Thm. 5) concerning necessary and sufficient conditions for the estimability of multinomial probabilities with truncated data.)

If either (A) or (B) occurs, $\hat s$ is not unique and the maximum likelihood estimate $\hat F$ will be determined only as far as belonging to a certain union of equivalence classes.
3. THE SELF-CONSISTENCY ALGORITHM

In this section, we describe an algorithm for obtaining the MLE of s based on the equivalence between the property of maximum likelihood and that of self-consistency. This latter property will be defined precisely below; it is an extension of the idea first used by Efron (1967) for right censored data and later by Turnbull (1974) for doubly censored data. The algorithm is related to one proposed by Hocking and Oxspring (1971) for multinomial data subject to censoring without truncation.

For $1 \le i \le N$, $1 \le j \le m$, let

$$I_{ij} = \begin{cases} 1 & \text{if } x_i \in [q_j, p_j], \\ 0 & \text{otherwise}. \end{cases}$$

Because of the censoring the value of $I_{ij}$ may not be known; however its expectation is given by

$$E_s[I_{ij}] = \alpha_{ij} s_j \Big/ \sum_{k=1}^{m} \alpha_{ik} s_k = \mu_{ij}(s), \text{ say}. \qquad (3.1)$$


Thus $\mu_{ij}(s)$ represents the probability that the i'th observation lies in $[q_j, p_j]$ when F belongs to the equivalence class defined by $s = (s_1, \dots, s_m)$. Also, because of the truncation, each observation $X_i = x_i$ can be considered a remnant of a group, the size of which is unknown and all of whose members (except the one observed) have x-values in $\bar B_i$, the complement of $B_i$. (They can be thought of as $X_i$'s "ghosts".) Let $J_{ij}$ be the number in the group corresponding to the i'th observation which have values in $[q_j, p_j]$. Of course $J_{ij}$ is unknown but its expectation, under s, is given by

$$E_s(J_{ij}) = (1 - \beta_{ij}) s_j \Big/ \sum_{k=1}^{m} \beta_{ik} s_k = \nu_{ij}(s), \text{ say}. \qquad (3.2)$$


If we treated (3.1), (3.2) as observed rather than expected frequencies, the proportion of observations in interval $[q_j, p_j]$ would be

$$\pi_j(s) = \frac{1}{M(s)} \sum_{i=1}^{N} \left[ \mu_{ij}(s) + \nu_{ij}(s) \right], \qquad (3.3)$$

say, where

$$M(s) = \sum_{i=1}^{N} \sum_{j=1}^{m} \left[ \mu_{ij}(s) + \nu_{ij}(s) \right].$$

Note that $M(s) \ge N$, with equality if there is no truncation, for then $\nu_{ij} = 0$ for all i, j. We say that the vector of probabilities s is self-consistent if

$$s_j = \pi_j(s_1, \dots, s_m) \qquad (1 \le j \le m). \qquad (3.4)$$

A self-consistent estimate (s.c.e.) of s is defined to be any solution of the simultaneous equations (3.4). The form of (3.4) immediately suggests an iterative procedure for finding the solution.

A. Obtain initial estimates $s_j^0$ $(1 \le j \le m)$. This can be any set of positive numbers summing to unity, e.g. $s_j^0 = 1/m$ for all j.

B. Evaluate $\mu_{ij}(s^0)$ and $\nu_{ij}(s^0)$ for $1 \le i \le N$ and $1 \le j \le m$, and hence $M(s^0)$ and $\pi_j(s^0)$.

C. Obtain improved estimates $s_j^1$ by setting $s_j^1 = \pi_j(s^0)$ for $1 \le j \le m$.

D. Return to Step B with $s^1$ replacing $s^0$, etc.

E. Stop when the required accuracy has been achieved.

(E.g. the rule may be to stop when $\max_{1 \le j \le m} |s_j^k - s_j^{k-1}| < 0.0001$, say. Alternatively a stopping rule may be based on the difference between successive values of the likelihood.)
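Steps A to E translate directly into code. The sketch below is a plain-Python version operating on the indicator matrices $\alpha_{ij}$ and $\beta_{ij}$ of (2.4); the function name, the default tolerance (the 0.0001 rule of Step E) and the iteration cap are illustrative choices rather than part of the method.

```python
def self_consistency(alpha, beta, tol=1e-4, max_iter=10_000):
    """Iterate s -> pi(s) of (3.3) until max_j |s_j^k - s_j^(k-1)| < tol."""
    m = len(alpha[0])
    s = [1.0 / m] * m                                  # Step A
    for _ in range(max_iter):
        totals = [0.0] * m                             # sum_i [mu_ij + nu_ij]
        for a_row, b_row in zip(alpha, beta):          # Step B
            p_a = sum(a * sj for a, sj in zip(a_row, s))
            p_b = sum(b * sj for b, sj in zip(b_row, s))
            for j in range(m):
                totals[j] += a_row[j] * s[j] / p_a           # mu_ij, (3.1)
                totals[j] += (1 - b_row[j]) * s[j] / p_b     # nu_ij, (3.2)
        big_m = sum(totals)                            # M(s)
        new = [t / big_m for t in totals]              # Step C: pi_j(s)
        done = max(abs(x - y) for x, y in zip(new, s)) < tol
        s = new                                        # Step D
        if done:                                       # Step E
            break
    return s


# Right censoring only (exact 1, 3; censored at 2, 5); cells [1,1], [3,3], [5, inf).
alpha = [[1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1]]
beta = [[1, 1, 1]] * 4
print(self_consistency(alpha, beta))   # [0.25, 0.375, 0.375]
```

On this right censored example the fixed point is reached after two sweeps and agrees with the product-limit jumps of 0.25 at 1 and 0.375 at 3, with the remaining 0.375 on $[5, \infty)$. With `alpha = [[1, 0], [0, 1], [0, 1]]` and `beta = [[1, 1], [1, 1], [0, 1]]` (the third observation truncated to the second cell), the "ghost" terms $\nu_{ij}$ keep the estimate at (0.5, 0.5), the maximiser of $L^* = s_1 s_2$; dropping them would instead give (1/3, 2/3).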

The procedure is easy to programme on a computer, requiring only simple operations. If any component of $s^k$ is small then it is possible for $M(s^k)$ to become very large. However, rounding errors can be avoided if the sequence of operations for computing the $\{\pi_j\}$ is chosen with care. Of course, the difficulty does not arise if there is no truncation, for then $M(s^k)$ is always equal to N.

Another way to write $\pi_j(s)$, which is useful if relatively few of the $\{A_i\}$ and $\{B_i\}$ are distinct, is

$$\pi_j(s) = \left[ \sum_{A} n_A I_A(j) \frac{s_j}{\sum_{k \in A} s_k} + \sum_{B} n_B \left( 1 - I_B(j) \right) \frac{s_j}{\sum_{k \in B} s_k} \right] \Big/ M(s), \qquad (3.5)$$

where $n_A$ is the number of observations censored into the set A, $n_B$ is the number truncated by the set B, and $I_A(j)$ equals 1 if $[q_j, p_j] \subseteq A$ and is zero otherwise ($I_B(j)$ is defined likewise, and $\sum_{k \in A}$ denotes summation over those k with $I_A(k) = 1$). Thus $\sum_A n_A = \sum_B n_B = N$ and, using this relation, $M(s)$, which is the sum over j of the quantities in square brackets in (3.5), can be written more simply as

$$M(s) = \sum_{B} n_B \Big( \sum_{k \in B} s_k \Big)^{-1}.$$

Therefore the computations in Step B of the algorithm involve summing only over the number of distinct $\{A_i\}$ and distinct $\{B_i\}$, which may be considerably less than N in the case when X is discrete, for example.
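The grouped form (3.5) can be sketched by counting distinct censoring and truncation patterns first. Here each $A_i$ and $B_i$ is represented by its row of indicators, so identical sets collapse to one key in a `collections.Counter`; the function name `pi_grouped` is hypothetical. The normalising constant is computed as $M(s) = \sum_B n_B (\sum_{k \in B} s_k)^{-1}$ rather than by summing the bracket.

```python
from collections import Counter


def pi_grouped(s, alpha, beta):
    """pi_j(s) via (3.5), summing over distinct sets A and B only."""
    m = len(s)
    n_a = Counter(tuple(row) for row in alpha)   # n_A for each distinct A
    n_b = Counter(tuple(row) for row in beta)    # n_B for each distinct B
    bracket = [0.0] * m
    for a_row, n in n_a.items():
        p_a = sum(sj for sj, a in zip(s, a_row) if a)
        for j in range(m):
            if a_row[j]:                          # I_A(j) = 1
                bracket[j] += n * s[j] / p_a
    big_m = 0.0
    for b_row, n in n_b.items():
        p_b = sum(sj for sj, b in zip(s, b_row) if b)
        big_m += n / p_b                          # M(s) = sum_B n_B / P_s(B)
        for j in range(m):
            if not b_row[j]:                      # 1 - I_B(j) = 1
                bracket[j] += n * s[j] / p_b
    return [x / big_m for x in bracket]


alpha = [[1, 0], [0, 1], [0, 1]]
beta = [[1, 1], [1, 1], [0, 1]]
print(pi_grouped([0.3, 0.7], alpha, beta))   # approximately [5/12, 7/12]
```

Only two distinct A-rows and two distinct B-rows are visited here for three observations; the saving grows when many observations share the same censoring and truncation sets.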

In  the  next  section  we  show  that  this  algorithm  converges  and  that 
self-consistent  estimates  also  maximise  the  likelihood.  In  Section  5, 
the  algorithm  is  discussed  further  and  compared  with  the  general 
Newton-Raphson  method  which  has  been  suggested  by  several  authors  in 
connection  with  various  special  cases. 


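4. THE EQUIVALENCE AND CONVERGENCE THEOREM

We now examine the equivalence of the s.c.e. and the m.l.e. From (2.4), we see that the log-likelihood is given by

$$L(s) = \sum_{i=1}^{N} \left[ \log\Big( \sum_{j=1}^{m} \alpha_{ij} s_j \Big) - \log\Big( \sum_{j=1}^{m} \beta_{ij} s_j \Big) \right]. \qquad (4.1)$$

Consider the effect of increasing a particular component, $s_j$ say, by a small positive amount $\epsilon$ and then dividing all the $\{s_k\}$, including $s_j + \epsilon$, by $1 + \epsilon$ in order to keep the sum equal to unity. We let $d_j(s)$ denote the value of the derivative of L with respect to $\epsilon$ at $\epsilon = 0$. Therefore

$$\begin{aligned} d_j(s) &= \frac{\partial L}{\partial s_j} - \sum_{k=1}^{m} s_k \frac{\partial L}{\partial s_k} && (4.2) \\ &= \sum_{i=1}^{N} \left[ \frac{\alpha_{ij}}{\sum_{k=1}^{m} \alpha_{ik} s_k} - \frac{\beta_{ij}}{\sum_{k=1}^{m} \beta_{ik} s_k} \right], && (4.3) \end{aligned}$$

where the second line uses $\sum_k s_k \, \partial L / \partial s_k = 0$, which holds because each term of (4.1) is a difference of logarithms of two linear functions of s. From (3.1)-(3.3),

$$\pi_j(s) = \frac{s_j}{M(s)} \left[ d_j(s) + \sum_{i=1}^{N} \Big( \sum_{k=1}^{m} \beta_{ik} s_k \Big)^{-1} \right], \qquad (4.4)$$

where we have substituted for $d_j(s)$ by (4.3). However

$$M(s) = \sum_{i=1}^{N} \sum_{j=1}^{m} \left[ \frac{\alpha_{ij} s_j}{\sum_{k=1}^{m} \alpha_{ik} s_k} + \frac{(1 - \beta_{ij}) s_j}{\sum_{k=1}^{m} \beta_{ik} s_k} \right] = \sum_{i=1}^{N} \Big( \sum_{k=1}^{m} \beta_{ik} s_k \Big)^{-1}.$$

Substituting in (4.4), we obtain

$$\pi_j(s) = \left( 1 + \frac{d_j(s)}{M(s)} \right) s_j \qquad (1 \le j \le m). \qquad (4.5)$$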


Now a necessary and sufficient condition for $\hat s$ to be an MLE is that, for each j,

$$\text{either } d_j(\hat s) = 0 \text{ or } d_j(\hat s) \le 0 \text{ with } \hat s_j = 0. \qquad (4.6)$$

Thus from (4.5) and (4.6), we see immediately that the MLE $\hat s$ satisfies $\pi_j(\hat s) = \hat s_j$ for all j, and hence is self-consistent.

Concerning convergence of the algorithm, we let s and s' be successive approximations where, by (4.5), $s'_j = [1 + d_j(s)/M(s)] s_j$ for $1 \le j \le m$. Now by a Taylor series expansion we have

$$\begin{aligned} L(s') - L(s) &= \sum_{j=1}^{m} (s'_j - s_j) \frac{\partial L}{\partial s_j} + O(\|s' - s\|^2) \\ &\approx \frac{1}{M(s)} \sum_{j=1}^{m} s_j \, d_j(s) \frac{\partial L}{\partial s_j} \\ &= \frac{1}{M(s)} \left[ \sum_{j=1}^{m} s_j \Big( \frac{\partial L}{\partial s_j} \Big)^2 - \Big( \sum_{j=1}^{m} s_j \frac{\partial L}{\partial s_j} \Big)^2 \right] && (4.7) \\ &= \frac{1}{M(s)} \sum_{j=1}^{m} s_j \, d_j^2(s) \ \ge\ 0, \end{aligned}$$

where we have used (4.2) and have neglected terms of second and higher order. Thus $L(s') \ge L(s)$, with equality only if, for each j, either $s_j = 0$ or $d_j(s) = 0$. Thus the algorithm converges monotonically, at least for $s^0$ close enough to $\hat s$ that higher order terms can indeed be neglected. Suppose that the limiting value is $\tilde s$. Then $\tilde s$ satisfies (3.4).

Hence if all $\tilde s_j > 0$, it follows by (4.5) that $d_j(\tilde s) = 0$ for all j and $\tilde s$ is the MLE $\hat s$ of s. Suppose then that $\tilde s_j = 0$ for some j and that $d_j(s) > 0$ in some neighbourhood of $\tilde s$. From the assumption that $s_j^0 > 0$ for all j it follows that $s_j^k > 0$ and $M(s^k) < \infty$ for $k = 0, 1, 2, \dots$. We are assuming that $s^k$ eventually lies in this neighbourhood where $d_j(s) > 0$. However (4.5) then implies that $s_j^k$ cannot decrease any further towards $\tilde s_j = 0$, which is a contradiction. Thus if $\tilde s_j = 0$ for some j, it follows that $d_j(\tilde s) \le 0$. (In fact the limit of $d_j(s)$ as $s \to \tilde s$ may not exist if some $\tilde s_j = 0$, in which case we interpret the previous statement, and (4.6), as meaning that $d_j(s)$ is not positive throughout any neighbourhood of $\tilde s$.) A similar argument shows that $d_j(\tilde s) = 0$ for any j such that $\tilde s_j > 0$. Hence $\tilde s$ satisfies the condition (4.6) for maximising L and this completes the proof of the equivalence of the s.c.e. and m.l.e.
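The identity (4.5) and the monotonicity argument can be checked numerically. The minimal Python sketch below, with hypothetical helper names, performs the update in the derivative form $s'_j = [1 + d_j(s)/M(s)] s_j$, with $d_j$ from (4.3) and $M(s) = \sum_i (\sum_k \beta_{ik} s_k)^{-1}$, and records the log-likelihood (4.1) along the way. The example data (four observations, the third truncated to the last two cells) are made up for illustration.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def log_lik(s, alpha, beta):                         # (4.1)
    return sum(math.log(dot(a, s)) - math.log(dot(b, s))
               for a, b in zip(alpha, beta))


def update(s, alpha, beta):
    """One step s -> pi(s), written in the form (4.5)."""
    m = len(s)
    d = [sum(a[j] / dot(a, s) - b[j] / dot(b, s)          # d_j(s), (4.3)
             for a, b in zip(alpha, beta)) for j in range(m)]
    big_m = sum(1.0 / dot(b, s) for b in beta)             # M(s)
    return [(1.0 + dj / big_m) * sj for dj, sj in zip(d, s)]


alpha = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
beta = [[1, 1, 1], [1, 1, 1], [0, 1, 1], [1, 1, 1]]
s = [1 / 3] * 3
history = [log_lik(s, alpha, beta)]
for _ in range(50):
    s = update(s, alpha, beta)
    history.append(log_lik(s, alpha, beta))
print(all(b >= a - 1e-12 for a, b in zip(history, history[1:])))   # True
```

In this example the log-likelihood never decreases along the iterates, in line with (4.7); the derivative form also reproduces the value of $\pi(s)$ obtained directly from (3.3).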

Note that for a given initial vector $s^0$, the limit $\tilde s$ is unique even if $\hat s$ is not. A maximum likelihood estimate $\hat F$ of F is given by

$$\hat F(x) = \begin{cases} 0 & \text{if } x < q_1, \\ \hat s_1 + \hat s_2 + \dots + \hat s_j & \text{if } p_j < x < q_{j+1} \ (1 \le j \le m-1), \\ 1 & \text{if } x > p_m, \end{cases}$$

and is undefined for $x \in [q_j, p_j]$, $1 \le j \le m$. Therefore, when plotted,
$\hat F$ consists of a series of m + 1 horizontal lines of increasing heights with gaps in between, where the way in which increases occur is arbitrary.
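Reading off $\hat F$ from the intervals and the estimated masses can be sketched as follows. The function name is ours, and returning `None` inside an interval $[q_j, p_j]$ mirrors the statement that $\hat F$ is undefined there; a right-continuous convention could be substituted.

```python
import math


def mle_cdf(x, cells, s_hat):
    """F_hat(x) for sorted disjoint cells [q_j, p_j] carrying masses s_hat[j].

    Returns None for x inside a cell, where the estimate is not determined.
    """
    cum = 0.0
    for (q, p), sj in zip(cells, s_hat):
        if x < q:
            return cum          # on the flat stretch before cell j
        if x <= p:
            return None         # inside [q_j, p_j]: increase is arbitrary
        cum += sj
    return cum                  # beyond p_m


cells = [(1, 1), (3, 3), (5, math.inf)]
s_hat = [0.25, 0.375, 0.375]
print([mle_cdf(x, cells, s_hat) for x in (0, 1, 2, 4, 7)])
# [0.0, None, 0.25, 0.625, None]
```

When the last cell is $[q_m, \infty)$, as here, the plateau at height 1 is never reached and $\hat F$ is undetermined beyond $q_m$.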

The variances of the non-zero $\{\hat s_j\}$ can be estimated by the inverse of the matrix of minus the second derivatives of L with respect to the elements of $(s_1, s_2, \dots, s_{m-1})$ corresponding to the non-zero elements of $\hat s$. Thus estimates of the variance of $\hat F(x)$ can be calculated for $x \notin C$, from which approximate standard errors can be obtained for the height of each horizontal line.

5. DISCUSSION


Asano (1965) considered the problem of estimating the parameters of a multinomial distribution with truncation (but no censoring). For the "nested" case when $B_1 \supseteq B_2 \supseteq \dots \supseteq B_N$, or the "chained" case when $B_i \cap B_j \ne \emptyset$ for $j = i-1, i, i+1$ and the intersection is empty otherwise (with reordering of the $X_i$ if necessary), Asano gave explicit expressions for $\hat s$. Thus these two special cases can be added to those mentioned in Section 1 as being ones where formulae for the MLE can be written down explicitly and an iterative procedure is not needed. For the general case, Asano suggested using constrained Newton-Raphson methods. A similar search method was also proposed by Peto (1973) for the special case of interval censoring only.


However  the  Newton-Raphson  (NR)  procedure  involves  updating  a vector  of 
first  derivatives  and  the  inverse  of  a large  matrix  of  second  derivatives  of 
L at  each  stage  of  the  iteration.  This  can  be  difficult  even  for  moderate 
values  of  m.  Furthermore  the  step  size  in  the  NR  iterations  must  be  checked 
to  ensure  that  the  boundary  of  the  region  R is  not  violated.  Also  an 
improvement at each stage is not guaranteed since the step size may be too
large  and  the  maximum  "overshot".  To  avoid  this,  the  likelihood  has  to  be 
calculated  and  if  it  has  decreased  the  exercise  must  be  repeated  with  a 
smaller  step  size,  and  so  on.  In  contrast  the  self-consistency  algorithm  is 
completely  automatic,  simple  to  implement  and  is  intuitively  appealing. 

In fairness, it should be pointed out that in exceptional cases, the convergence of the self-consistency algorithm can be rather slow. This happens if both $\hat s_j = 0$ and $d_j(\hat s) = 0$ for some j, i.e. the likelihood has an unconstrained maximum at $s_j = 0$. Why this is so can be seen from Equation (4.7). For example, suppose m = 3, N = 4 and $L^* = s_1 (s_2 + s_3) s_3 (s_1 + s_2)$. This represents the case of no truncation and three intervals, with one X in the first interval, one not in the first, one in the third and one not in the third. Starting with $s_j^0 = 1/3$ $(j = 1, 2, 3)$, we have $s_1^k = s_3^k = (1 - s_2^k)/2$ and $s_2^k = (3 + k)^{-1}$. Hence the convergence towards the MLE $\hat s_1 = \hat s_3 = 1/2$, $\hat s_2 = 0$ is quite slow. However such cases are exceptional and usually the convergence is rapid.
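The slow convergence in this example is easy to reproduce. The sketch below applies one self-consistency step (3.3) to $L^* = s_1(s_2+s_3)s_3(s_1+s_2)$, i.e. censoring sets {1}, {2,3}, {3}, {1,2} with no truncation, so that $M(s) = N = 4$; the function name `step` is ours.

```python
def step(s):
    """One self-consistency step for censoring sets {1}, {2,3}, {3}, {1,2}."""
    s1, s2, s3 = s
    t1 = 1.0 + s1 / (s1 + s2)                   # from sets {1} and {1,2}
    t2 = s2 / (s2 + s3) + s2 / (s1 + s2)        # from sets {2,3} and {1,2}
    t3 = s3 / (s2 + s3) + 1.0                   # from sets {2,3} and {3}
    return (t1 / 4, t2 / 4, t3 / 4)             # divide by M(s) = N = 4


s = (1 / 3, 1 / 3, 1 / 3)
for k in range(1, 101):
    s = step(s)
    assert abs(s[1] - 1 / (3 + k)) < 1e-12     # s_2^k = (3 + k)^(-1)
print(s)   # s_2 is still about 0.0097 after 100 steps
```

Successive changes in $s_2^k$ are $1/((2+k)(3+k))$, so the 0.0001 stopping rule of Step E would halt at k = 98, with $s_2^{98} = 1/101 \approx 0.0099$ rather than 0.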
6. APPLICATION TO HYPOTHESIS TESTING

An important application of the MLE $\hat F$ is to the two sample problem where it is desired to test the equality of two distributions, and observations on one or both are subject to arbitrary censoring and truncation. (Extension to the K sample problem is immediate.) Let us suppose that $X_1, \dots, X_{N_1}$ is a sample from Group 1 and the remaining $N_2 = N - N_1$ observations are from Group 2. It is desired to test the null hypothesis $H_0$ that all N observations have the same underlying F (unspecified). The alternative $H_1$ is that Group 1 observations have an underlying $F = F_{\theta_0}$ while $F = F_\theta$ for Group 2 observations. We consider Lehmann alternatives, i.e. $F_\theta(x) = G_\theta(F_{\theta_0}(x))$, where $G_\theta$ is a specified c.d.f. on [0, 1] with $G_{\theta_0}(y) \equiv y$, while $F_{\theta_0}$ is unspecified. Peto and Peto (1972) have derived asymptotically efficient rank invariant tests for interval censored data, and their procedure can be naturally extended to the situation with arbitrary censoring and truncation as follows.

The likelihood $L_i$ under $H_1$, with $F(x) = G_\theta(F_0(x))$, of the $i$'th
observation, represented by the pair $(A_i, B_i)$, is

$$L_i = \frac{\sum_{j=1}^{m} g(j,\theta,s)\,\alpha_{ij}}{\sum_{j=1}^{m} g(j,\theta,s)\,\beta_{ij}},$$

where $g(j,\theta,s) = G_\theta(s_1 + \cdots + s_j) - G_\theta(s_1 + \cdots + s_{j-1})$ and $\{\alpha_{ij}\}$,
$\{\beta_{ij}\}$ are defined as before. An efficient score for the $i$'th observation
is given by $U_i = \partial \log L_i / \partial\theta \,|_{\theta=0}$, i.e.

$$U_i = \frac{\sum_{j=1}^{m} \dot g(j,s)\,\alpha_{ij}}{\sum_{j=1}^{m} s_j\,\alpha_{ij}} - \frac{\sum_{j=1}^{m} \dot g(j,s)\,\beta_{ij}}{\sum_{j=1}^{m} s_j\,\beta_{ij}},$$

where $\dot g(j,s) = \partial g(j,\theta,s)/\partial\theta \,|_{\theta=0}$, and we have used the fact that $g(j,0,s) = s_j$,
since $G_0(y) \equiv y$.
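To make the score concrete, the sketch below computes $U_i$ for one illustrative Lehmann family, $G_\theta(y) = 1 - (1-y)^{e^\theta}$, a proportional-hazards family that satisfies $G_0(y) \equiv y$. This choice of $G_\theta$ and the helper names `h`, `g_dot` and `efficient_score` are assumptions for the example, not part of the text. For this family $\partial G_\theta(y)/\partial\theta\,|_{\theta=0} = -(1-y)\log(1-y)$, so $\dot g(j,s)$ is a difference of that function at consecutive partial sums of $s$. Rows of $\{\alpha_{ij}\}$ and $\{\beta_{ij}\}$ are passed as 0/1 lists, and in practice $s$ would be the MLE $\hat s$.

```python
import math
from typing import List, Sequence


def h(y: float) -> float:
    """dG_theta(y)/dtheta at theta = 0 for G_theta(y) = 1 - (1 - y)**exp(theta).

    Equals -(1 - y) * log(1 - y), with limiting value 0 at y = 1.
    """
    y = min(max(y, 0.0), 1.0)  # guard against rounding outside [0, 1]
    if y == 1.0:
        return 0.0
    return -(1.0 - y) * math.log(1.0 - y)


def g_dot(s: Sequence[float]) -> List[float]:
    """g_dot(j, s) = h(s_1 + ... + s_j) - h(s_1 + ... + s_{j-1}), j = 1..m."""
    out, cum = [], 0.0
    for sj in s:
        prev = cum
        cum += sj
        out.append(h(cum) - h(prev))
    return out


def efficient_score(s: Sequence[float],
                    alpha_i: Sequence[int],
                    beta_i: Sequence[int]) -> float:
    """U_i for one observation, given the indicator rows alpha_i and beta_i."""
    gd = g_dot(s)
    num_a = sum(g * a for g, a in zip(gd, alpha_i))
    den_a = sum(sj * a for sj, a in zip(s, alpha_i))
    num_b = sum(g * b for g, b in zip(gd, beta_i))
    den_b = sum(sj * b for sj, b in zip(s, beta_i))
    return num_a / den_a - num_b / den_b
```

When there is no truncation every $\beta_{ij} = 1$, and the second term vanishes because $\sum_j \dot g(j,s) = \partial[G_\theta(1) - G_\theta(0)]/\partial\theta = 0$ for any family of c.d.f.'s on [0,1].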


If we assume that, under $H_0$, the censoring and truncation mechanism is random
and independent of group membership, a test of any given size can be constructed
using the permutational distribution of $\sum_{i \in \text{group } 1} U_i$, with the $U_i$ evaluated at the MLE $\hat s$ from the pooled sample.
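For small samples the permutational distribution can be enumerated exactly. The sketch below assumes the scores $U_i$ have already been computed and uses a two-sided rule, rejecting for large $|T - E(T)|$, where $T = \sum_{i \in \text{group } 1} U_i$ and $E(T) = N_1 \bar U$ is its mean over all $\binom{N}{N_1}$ relabellings. The two-sided form and the function name are our choices. Because the number of relabellings grows quickly, random sampling of permutations would replace exact enumeration for larger N.

```python
from itertools import combinations
from typing import Sequence, Tuple


def exact_permutation_test(scores: Sequence[float],
                           group1: Sequence[int]) -> Tuple[float, float]:
    """Return (T, p): T is the sum of scores over group 1 and p the exact
    two-sided permutation p-value based on |T - E(T)|."""
    n, n1 = len(scores), len(group1)
    t_obs = sum(scores[i] for i in group1)
    centre = n1 * sum(scores) / n  # mean of T over all relabellings
    dev_obs = abs(t_obs - centre)
    extreme = total = 0
    for labels in combinations(range(n), n1):
        total += 1
        t = sum(scores[i] for i in labels)
        # small tolerance so values tied with the observed one count as extreme
        if abs(t - centre) >= dev_obs - 1e-12:
            extreme += 1
    return t_obs, extreme / total
```

For instance, `exact_permutation_test([1.0, 2.0, 3.0, 4.0], [2, 3])` gives T = 7 and p = 1/3, since 2 of the 6 possible relabellings are as far from E(T) = 5.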

The test statistic may not be unique if $\hat s$ is not unique. If this
situation occurs in small samples, the test statistic can be evaluated for
"extreme" values of the possible $\hat s$, and if the decision concerning acceptance
or rejection of $H_0$ is the same there is no difficulty. In large samples
non-uniqueness of $\hat s$ is less likely to occur, and if it does the difference
between significance levels for the different values of $\hat s$ will be small.

The above discussion assumes that the same random censoring mechanism is
operating in each group. Tests that do not make this assumption have been
proposed by Efron (1967) for right censored data and Mantel (1967) for
arbitrarily censored data (with no truncation). Efron uses the MLE's of the
distributions of the two groups, calculated separately, to derive an estimate
of the probability that an X-value from group 1 is greater than one from
group 2, and this is used as a test statistic. In theory the test could
easily be extended to arbitrary censoring; however, it is difficult to compute the
sampling distributions involved. Mantel (1967, Section 7) describes a test
for arbitrarily censored data which is a generalisation of that of Gehan
(1965) and does not use the estimated c.d.f. A disadvantage of this test
is that it requires knowledge, not always available, of the entire pattern of
restriction for each observation, even if it is exact. Also, much of the
information in the data is unused, and thus the test will be rather inefficient.
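The point estimate in Efron's approach, as described above, is a plug-in estimate of $P(X_1 > X_2)$ from the two separately estimated distributions. A minimal sketch, assuming both MLEs put point masses `p1` and `p2` on a common set of support points; tied values are simply not counted, and the sampling distribution, which the text notes is the hard part, is not attempted.

```python
from typing import Sequence


def prob_greater(support: Sequence[float],
                 p1: Sequence[float],
                 p2: Sequence[float]) -> float:
    """Plug-in estimate of P(X1 > X2) for independent X1 ~ p1 and X2 ~ p2,
    both given as probability masses on the same support points."""
    return sum(p1[i] * p2[j]
               for i, x in enumerate(support)
               for j, y in enumerate(support)
               if x > y)  # strict inequality: ties contribute nothing
```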




7. CONCLUSION

The definition of self-consistent estimates does not directly involve
the likelihood function, and so their exact coincidence with the MLE's is an
aesthetically pleasing and perhaps unexpected result. The property was first proved for
singly censored data by Efron (1967) and for doubly censored data by
Turnbull (1974). However, their methods were lengthier and involved converting
the likelihood equations into the defining equations for self-consistency, rather
than examining the successive values of the estimates given by the
algorithm.

An alternative nonparametric approach is to estimate the hazard
rate associated with F rather than F itself. The work of several authors
is summarised by Barlow (1968, Section 3). Usually the hazard rate is assumed
to be a step function, constant within each interval, the set of intervals
being fixed. For instance, Harris, Meier and Tukey (1950) treat an interval
censoring situation and use a similar "prorating" idea as the basis for an
iterative scheme for obtaining approximate MLE's of the hazard rates in the
various intervals.
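For a step-function hazard, an estimate of F is recovered from the estimated rates. The sketch below assumes a continuous, non-negative lifetime X, for which $F(x) = 1 - \exp\{-\int_0^x \lambda(u)\,du\}$, with hazard `rates[k]` on `[breaks[k], breaks[k+1])` and the last rate applying from `breaks[-1]` onwards. It only illustrates the hazard-to-c.d.f. relation; the function names are hypothetical, and the Harris, Meier and Tukey prorating scheme itself is not shown.

```python
import math
from typing import Sequence


def cumulative_hazard(x: float,
                      breaks: Sequence[float],
                      rates: Sequence[float]) -> float:
    """H(x) for a hazard equal to rates[k] on [breaks[k], breaks[k+1]);
    breaks must be increasing, start at 0 and have the same length as rates."""
    H = 0.0
    for k, lam in enumerate(rates):
        left = breaks[k]
        right = breaks[k + 1] if k + 1 < len(breaks) else math.inf
        if x <= left:
            break
        H += lam * (min(x, right) - left)  # time spent in interval k up to x
    return H


def cdf_from_step_hazard(x: float,
                         breaks: Sequence[float],
                         rates: Sequence[float]) -> float:
    """F(x) = 1 - exp(-H(x))."""
    return 1.0 - math.exp(-cumulative_hazard(x, breaks, rates))
```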

Consistency and other large sample properties of the maximum likelihood
estimate $\hat F$ will depend on the censoring and truncation mechanism. Consider
the case when the range of X is finite, $\{t_1, t_2, \ldots, t_m\}$ say, and when the
mechanisms are random as described in Section 1. If we suppose that the sets
$B_i$ and the partitions with non-zero probability are such that conditions (A) and
(B) as stated in Section 2 do not occur for N sufficiently large, then
$q_j = p_j = t_j$ $(1 \le j \le m)$ and $\hat s$ is unique, again for N sufficiently
large. Then consistency and asymptotic normality of the MLE's of the
non-zero $s_j$ follow from the standard theory, regarding the pairs $(A_i, B_i)$
as i.i.d. random variables involving a finite number of parameters including
the $\{s_j\}$. Large sample properties of $\hat F$ when m does not remain bounded
as $N \to \infty$ are an interesting open question. (Results are known for the case of single
censoring; see Breslow and Crowley (1974).)

For the situation when there is concomitant information available for
each observation, there appears to be no natural extension of the powerful
methods that Cox (1972) has proposed for singly censored data. However, in
a recent paper on regression with censored data, R.G. Miller (1974) uses $\hat F$
as a basis for inference, and thus it appears that his methods can be extended
to the case of arbitrary censoring and truncation.

It is interesting to note that Sackrowitz and Strawderman (1974) have
shown that, for a wide class of reasonable loss functions, the MLE $\hat F$ is
inadmissible when the range of X is finite and there is
extreme double censoring (no exact observations). Thus in general the MLE
will be inadmissible. However, unless a prior measure can be assigned to
the space of possible c.d.f.'s F, there is no apparent alternative that is
clearly preferable to the MLE. Indeed, self-consistency provides a justification for
using maximum likelihood even in relatively small samples.